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use by the file system of the computer, and is used to 
store replicated data of other ones of the plurality of 
computer entities. In the event of failure of any one of 
the data storage devices, data can be recovered from 
the second data storage areas of the other data storage 
devices. 
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Description 

Field of the Invention 

[0001] The present invention relates to the field of da- 
ta storage in computers, and particularly, although not 
exclusively, to a plurality of networked computers stor- 
ing data on internal non-volatile memory devices. 

Background to the Invention 

[0002] Conventionally, corporations using a plurality 
of computers, for example a plurality of networked per- 
sonal computers (PCs) or Macintosh ©type computers, 
make backup copies of data on a networked system to 
guard against loss of data caused by computer or disk 
drive failure, or by loss of computers or disk drives. 
There are many known types of back up hardware sys- 
tems, and conventionally these fall into 3 broad catego- 
ries termed on-line, near-line and off-line backup sys- 
tems. 

[0003] On-line backup systems are aimed at backing 
up data lost due to failure of parts of computer networks, 
where the backup procedure can be initiated almost im- 
mediately, once the loss of data is discovered. On-line 
backup systems form an integral part of a computer net- 
work, and includes such systems as a redundant server 
which mirrors the data in a main server, and which is 
connected over a same local area network as the main 
server. On-line systems, particularly for small compa- 
nies, do not protect against catastrophic events such as 
a fire destroying all the computer equipment, or theft of 
all computer equipment in a network. However, they pro- 
vide relatively fast recovery times from equipment fail- 
ure. 

[0004] Near- line systems involve storage of data on 
devices having lower response times than on-line sys- 
tems in the event of data loss. Typically, a near-line sys- 
tem may comprise a CD ROM cassette system, or a 
tape-spool system, where the CD ROMs and tapes are 
removable from a drive. Large volumes of CD ROMs or 
tapes may be stored within a same building as the com- 
puter network, and which are readily available in the 
event of data loss. 

[0005] Off-line systems include backup to data stor- 
age devices which are removed from the physical loca- 
tion of the network, for example stored a few miles away, 
in the event of a catastrophic failure of the network, e. 
g. theft of all computers, or destruction of all computers 
by fire, off-line systems provide the means to recover 
data. Off-line systems typically have delay times in re- 
storing backup data which are greater than near-line 
systems. 

[0006] There are a wide variety of legacy backup sys- 
tems in use, however many corporations run computer 
networks which, in practice, have shortfalls in backup 
procedures and which leave companies vulnerable to 
loss of data. Many corporations are without on-line, 



near-line or off-line backup facilities, or have gaps in 
their backup coverage having on ly on-line or off-line and 
no near-line facilities, or on-line facilities only with no off- 
line facilities for example. 

5 [0007] In the PC market, recently the data capacity of 
disk drives sold within PCs has increased to levels at 
which many users have large volumes of spare non-vol- 
atile memory available, which exceeds their local PC da- 
ta storage requirements. For example, in a system of 

10 networked personal computers running on a Unix or 
Windows NT® operating system, and communicating 
with the file server upon which data is stored, individual 
PCs may have unused non-volatile data storage capac- 
ities in the range 1 -9 gigabytes per PC. This effectively 

15 represents a computer resource which has been paid 
for, but which remains unused. Whatever the size of 
computer network, having unused non-volatile disk 
space in a network adds to the cost of ownership of a 
network, but provides no benefit to the network owner. 

20 [0008] EP 0854423 teaches of a method for distribut- 
ed data processing using individual platforms intercon- 
nected by acommunication network. The individual plat- 
forms are configured to process, control and store data 
in a distributed manner. In the event of a failure of a par- 

25 ticular platform, the remaining interconnected platforms, 
having shared data of the failed platform distributed 
across their network, process the tasks of this failed plat- 
form. 

[0009] A similar distributed data processing network 
so is found in WO 96/37837 which teaches of a computer 
system potentially capable of data self-repair in the 
event of multiple individual platform failures. This disclo- 
sure is directed to fault tolerance in a database server 
system. 

35 [0010] US 5,586,31 0 is further concerned with distrib- 
uted data processing and is directed to provide a dis- 
tributed processing system configured to update global 
distributed data following a local data update at an indi- 
vidual platform. The disclosure is of a distributed data- 

40 base technology, having take-over of one node's data, 
which resides elsewhere, upon failure of the originating 
node. 

[0011] With reference, in part, to the prior art the in- 
ventors have recognised the need for distributed data 

45 storage utilizing spare non-volatile disk storage devices, 
these devices being non-localised thereby forming a 
distributed storage capacity. In particular, the inventors 
recognise a need for a management utility forming part 
of the distributed data storage system, whereby the 

50 management utility is capable of performing a variety of 
functions. In particular, the setting up of the distributed 
data network, the selecting of individual computer enti- 
ties to participate in the network, and the sizing and di- 
viding of individual non-volatile data storage devices in 

55 order to optimise data storage and recovery. Such a 
management utility not being found in the art. 
[0012] The inventors have recognized that spare non- 
volatile disk storage capacity on individual computers in 
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a network represents an unused resource which by 
putting the unused disk space to use in providing a data 
backup facility can be used to reduce the overall cost of 
ownership of a network and reduce the cost of owner- 
ship of each unit of computing capability provided by a 
network. 

Summary of the Invention 

[001 3] One object of the present invention is to utilize 
unused non-volatile data storage space on individual 
computers in a network of computers, for the purpose 
of data protection. For any individual computer, a non- 
volatile memory storage device, such as hard disk drive, 
is divided into a first area, which is available for use by 
the computer for storage of applications, user data, ex- 
ecutable files and the like, and a second data storage 
area which is useable for storing backup data of one or 
more user data areas of a plurality of other non-volatile 
memory devices in a plurality of other computers in a 
network. 

[0014] In the majority of prior art computer networks 
comprising a plurality of prior art computer entities, there 
exists unused non-volatile data storage area on hard 
disk drives which will never be used. This represents a 
resource which has been paid for by a customer, but 
which gives no benefit to the user. Specific implemen- 
tations of the present invention aim to put this unused 
resource, which has to be paid for whether used or not, 
to better use in enabling a fast on-line data recovery in 
the event of corruption of data on at feast one of the non- 
volatile data storage devices in a computer network. Im- 
plementation of the invention may provide a distributed 
data storage and recovery network having a systems 
manager utility which allows a user to search and select 
individual computer platforms to participate in the as- 
sembled global distributed data system. The manager 
further providing a user with the facility to configure the 
selected individual computer entities, with particular ref- 
erence to the sizing and dividing of their non-volatile da- 
ta storage devices. Specific implementations according 
to the invention herein may be implemented as an alter- 
native or a conventional off-line or near-line back up sys- 
tem, depending upon the requirements of the owner of 
the computer network. 

[0015] In one specific embodiment of the present in- 
vention, comprising a number N data storage devices, 
data from N-1 of the devices can be backed up from a 
remaining one data storage device. 
[001 6] According to a first aspect of the present inven- 
tion there is provided a network of computers charac- 
terized by comprising: 

a plurality of individual computer devices each hav- 
ing a non-volatile data storage device and each 
having means (305) for communicating with at least 
one other one of said plurality of computers; 
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each said non-volatile data storage device being di- 
vided into a first data storage area reserved for use 
by the corresponding computer device, and a sec- 
ond data storage area reserved for backup storage 
of data contained in at least one said first data stor- 
age area of at least one other said non-volatile data 
storage device; 

a data protection component for providing data pro- 
tection for user data stored on each of the non vol- 
atile data storage devices by storing the user data 
which is resident within the said first data storage 
area of each non volatile data storage device, in one 
or a plurality of said second data storage areas of 
other said non volatile data storage devices. 

[0017] According to a second aspect of the present 
invention there is provided a computer entity character- 
ized by comprising: 

at least one data processor; 

at least one non volatile data storage device being 
divided into a first data storage area as a reserve 
for use by data of said computer entity, and a sec- 
ond data storage area reserved for storage of data 
contained in at least one other computer entity; and 

a data protection component for providing data pro- 
tection for user data of said at least one other com- 
puter entity. 



[0018] According to a third aspect of the present in- 
vention there is provided a method of data protection in 

35 a network of computer entities characterized by com- 
prising a plurality of individual computer entities, each 
having a data processor, and at least one non-volatile 
data storage device, and each having means for com- 
municating with at least one other of said plurality of 

40 computer entities, said method comprising the steps of: 

for each said computer entity; 



dividing a said non-volatile data storage device of 
said computer entity into a first data storage area, 
and a second data storage area; 

assigning said first data storage area for use in stor- 
ing data for the operation of a corresponding said 
respective said data processor; and 

assigning said second data storage are a for storage 
of data by at least one other said computer entity. 



55 [001 9] According to a fourth aspect of the present in- 
vention there is provided a method of data protection in 
a network of computer entities, characterized by each 
said computer entity comprising at least one data proc- 
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essor and at least one non-volatile data storage device; 

each said non-volatile data storage device being di- 
vided into a first data storage area dedicated for use 
by a said corresponding respective computer entity, 5 
and a second data storage area dedicated for use 
in storing data of at least one other one of said plu- 
rality of computer entities; 

said method comprising the steps of: 10 

searching said network of computer entities to 
find at least one non-volatile data storage de- 
vice and selecting individual ones of said at 
least one non-volatile data storage devices; 15 
and 

copying data stored in a first said data storage 
area of a first said no n -volatile data storage de- 
vice into a second said data storage area of a 20 
second said non-volatile data storage device. 

[0020] According to a f ifth aspect of the present inven- 
tion there is provided a method of data protection in a 
computer entity characterized by comprising at least 25 
one data processor, at least one non-volatile data stor- 
age device, and a network port, said data storage device 
being divided into a first data storage area dedicated for 
use by said processor, and a second data storage area 
dedicated for use in storing data unrelated to said proc- 30 
essor: 

said method comprising the steps of: 

finding a plurality of other non-volatile data storage 
devices of other computer entities and selecting in- 35 
dividual other said non-volatile data storage devic- 
es; 

receiving via said network port a said data unrelated 

to said processor; and 40 

storing said received data in said second data stor- 
age area of said non-volatile data storage device. 

Brief Description of the Drawings 45 

[0021] For a better understanding of the invention and 
to show how the same may be carried into effect, there 
will now be described by way of example only, specific 
embodiments, methods and processes according to the so 
present invention with reference to the accompanying 
drawings in which: 

Fig. 1 illustrates schematically a prior art network of 
computer entities including a file server having an 55 
off-line data storage device; 

Fig. 2 illustrates schematically a plurality of perma- 



nently unused data storage areas of the plurality of 
computer entities in the prior art network; 

Fig. 3 illustrates schematically a network of compu- 
ter entities according to a specific implementation 
of the present invention, in which means are provid- 
ed for utilizing a plurality of unused data areas on a 
plurality of computer entities in the network; 

Rg. 4 illustrates schematically an architecture of a 
data protection manager module according to first 
specific embodiment of the present invention; 

Fig. 5 illustrates schematically a plurality of no n- vol- 
atile data storage devices divided into first and sec- 
ond data storage areas according to a specific 
method of the present invention; 

Fig. 6 illustrates schematically a first mode of oper- 
ation of a computer network according to a first spe- 
cific implementation of the present invention; 

Fig. 7 illustrates schematically a second mode of 
operation, being a differential backup mode, ac- 
cording to the first specific implementation of the 
present invention; 

Fig. 8 illustrates schematically a third mode of op- 
eration, being an on-line backup mode of the first 
specific implementation of the present invention; 

Fig. 9 illustrates schematically an undivided data 
storage area of a non-volatile data storage device 
containing data files distributed throughout the 
whole of the data storage area in non -contiguous 
fashion; 

Rg, 1 0 illustrates schematically a divided data stor- 
age area comprising a first data storage area re- 
served for use by a processor of a same computer 
entity as the data storage device, and a second data 
storage area reserved for use by other computer 
entities; 

Rg. 11 illustrates schematically a method for parti- 
tioning a data storage area of a non-volatile data 
storage device according to a second specific meth- 
od of the present invention; 

Rg. 12 illustrates schematically a set up method for 
setting up a computer network to operate a data 
protection method; 

Fig. 1 3 illustrates schematically a user interface dis- 
play for finding and selecting computer entities as 
part of the set up method shown in Fig. 12; 

Fig. 1 4 illustrates schematically a user interface dis- 
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play produced during the set up method of Fig. 12 
herein; 

Fig. 15 illustrates schematically a second set up 
procedure for setting up a second data protection s 
method according to a second specific implemen- 
tation of the present invention; and 

Fig. 16 illustrates schematically a set up option of 
the second set up method shown in Fig. 1 5. 10 

Detailed Description of the Best Mode for Carrying 
Out the invention 

[0022] There will now be described by way of example 15 
the best mode contemplated by the inventors for carry- 
ing out the invention. In the following description numer- 
ous specific details are set forth in order to provide a 
thorough understanding of the present invention. It will 
be apparent however, to one skilled in the art, that the 20 
present invention may be practiced without limitation to 
these specific details. In other instances, well known 
methods and structures have not been described in de- 
tail so as not to unnecessarily obscure the present in- 
vention. 25 
[0023] In this specification, by the term 'data storage 
device', it is meant a data storage device which is seen 
by a processor to be a single logical data storage entity. 
Examples of data storage devices include: a single ro- 
tating hard disk drive; a raid array comprising a plurality 30 
of hard disk drives; a magnetic random access memory 
device; or the like. The term 'non-volatile data storage 
device' shall be interpreted accordingly. 
[0024] In this specification, the term 'computer entity' 
refers to at least one data processor and at least one 35 
data storage device operating as a single logical data 
processing entity, wherein the at least one data storage 
device has a data storage area dedicated for storage of 
files used by the processor(s), for their normal opera- 
tion, and which is inaccessible to other processors out- 40 
side the computer entity except via the processor(s) of 
the computer entity. A single computer entity will usually 
be contained in its own discrete housing and may be 
shipped or transported as a whole unit within its single 
housing. 45 
[0025] Referring to Fig. 1 herein, there is illustrated 
schematically part of a prior art network of computers 
comprising a plurality of computers, for example person- 
al computers 100-102, communicating with each other 
over a local area network 104; and a known file server so 
device 105. Each of the network computers 100-102 
have a non-volatile hard disk data storage device upon 
which are stored applications and local configurations 
for the computer. The file server 105 stores data files 
which are accessed by the computers, and is provided ss 
with a backup facility, for example a known DDS format 
tape drive 106. A known approach to data backup is to 
copy all data, signified by shaded data areas 203-205 



from the hard drive disks of the network computers onto 
a backup device such as a DDS format tape device 206 
attached to a server, either in an internal bay or on an 
external connection to that server. Alternatively, or ad- 
ditionally, data can be backed up onto an on-line data 
storage system such as the Auto Backup product of 
Hewlett Packard Company, which comprises a plurality 
of non-volatile hard disk devices. 
[0026] Referring to Fig. 2 there is shown logically the 
example prior art computer network of Fig. 1 herein. 
Each conventional computer has a non-volatile hard 
disk data storage device 200-202 respectfully. For each 
hard disk, a proportion of the disk is likely to remain un- 
used. 

[0027] Referring to Fig. 3 herein, there is shown sche- 
matically a network of computer entities modified to em- 
body and operate according to a specific implementa- 
tion of the present invention. Each computer entity com- 
prises a plurality of application programs 300; an oper- 
ating system 301 ; a user interface 302 including a key- 
board, a pointing device such as a mouse or trackball, 
and a visual display unit; at least one data processor 
303; an amount of memory 304 including volatile mem- 
ory and a non-volatile memory device, for example a ro- 
tating hard disk drive; a communications port 305 for 
communicating with other computers in a network 
across a local area network 306; and a data protection 
management module 307. A computer entity may com- 
prise a network attached storage device (NAS), which 
may not necessarily have attached keyboards, pointing 
devices and visual display devices. 
[0028] It will be understood by these skilled in the art 
that variations of processor, peripheral device, user in- 
terface, operating system and applications may be 
present from computer to computer. 
[0029] The data protection manager module compris- 
es code which is stored in at least one said non-volatile 
datastorage device. The data protection manager mod- 
ule 307 operates to provide data protection for data 
stored on each of the non-volatile data storage devices, 
by storing the user data, which is resident within a first 
memory area of each non-volatile data storage devices 
in one or a plurality of second memory areas of other 
non-volatile data storage devices of the plurality of non- 
volatile data storage devices. 

[0030] Referring to Fig. 4 herein, there is illustrated 
schematically an architecture of data protection manag- 
er 307. In a preferred embodiment, data protection man- 
ager 307 is constructed of a plurality of modules, each 
module comprising code capable of operating in con- 
junction with a processor and memory means of a com- 
puter entity, for performing the specific methods as de- 
scribed herein. Data protection manager 307 comprises 
a set up module 400 used for setting up a computer en- 
tity to operate data protection according to methods de- 
scribed herein, the set up module 400 comprising a find 
and select module 401 , for finding a plurality of non-vol- 
atile data storage devices in a network of computer en- 
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titles, and enabling a user to select which of the found 
non -volatile data storage devices will participate in the 
data protection methods described herein; a sizing and 
dividing module 402 for enabling a user to select a size 
of first and second data areas within an individual non- 
volatile data storage device, and divide the available 
memory area into the first and second data storage ar- 
eas for each said non-volatile data storage device; a da- 
ta transfer allocation module 403 for implementing 
transfer and copying of data between individual non-vol- 
atile data storage devices, the data transfer allocation 
module 403 comprising a first transfer algorithm 404 ca- 
pable of operating a fully redundant mode of data pro- 
tection, and a distributed file system (DFS) based algo- 
rithm 405 capable of operating a distributed scaleable 
data transfer method; a backup scheduler 406 for cre- 
ating backup schedules and for activating copying of da- 
ta between first and second data areas at preset times; 
and a user interface generator 407 for generating visual 
displays for scheduling backups, for sizing and dividing 
data storage areas of data storage devices, and for find- 
ing and selecting data storage devices to participate in 
a data protection method as described herein. 
[0031] In the best mode implementation, the data pro- 
tection manager 307 is installed on each of a plurality of 
computer entities in a computer network. 
[0032] There will now be described a first specific 
method of operation of the network of computer entities 
of Fig. 3 according to the present invention. 
[0033] Referring to Fig. 5 herein, there is illustrated 
schematically a logical representation of a plurality of 
non-volatile data storage devices 500-502, for example 
rotating hard disk drive units, within a corresponding re- 
spective plurality of computer entities 503-505. After 
having installed the data protection manager modules 
307 onto each of a plurality of computers 503-505, each 
of the data storage devices 500-502 are partitioned into 
a first storage area 506-508 respectively and a second 
data storage area 509-511 respectively. For each com- 
puter, data, applications programs, an operating system 
and all other data and programs which are necessary 
for normal operation of a computer are consolidated to 
be stored within the first data storage area of the corre- 
sponding respective data storage device. The operating 
system of the computer does not access, for normal op- 
eration of that computer, the second data storage area 
of its non-volatile data storage device, but this is re- 
served for data protection of user data of at least one 
other of the plurality of computers within the network. 
The first data storage areas 506-508 respectively, may 
be pre-selectable by the data protection manager 307 
to reserve a selectable percentage of the overall data 
capacity of the data storage device. For example, where 
a 9 Gbyte drive is installed, one Gbyte of data storage 
space may be reserved as the first data storage area, 
and the operating system, applications, drivers, and us- 
er data for normal operation of the computer may be res- 
ident in that first data storage area. The second data 



storage area may comprise the remaining 8 Gbytes of 
available user data space. 

[0034] For example in a network comprising 9 com- 
puters each having a 9 Gbyte non -volatile data storage 

5 device, pre-configured such that each data storage de- 
vice has a 1 Gbyte first data storage area and an 8 Gbyte 
second data storage area, in a robust first mode of op- 
eration, each data storage device contains backup data 
from the other 8 data storage devices. That is, where 

10 the 9 computers are labeled A-l, the first data storage 
area of the data storage device of first computer A con- 
tains data specific to computer A only, and the second 
data storage area 509 of first computer A contains data 
which is stored in the first data storage areas of the re- 

15 maining 8 computers B-l. Thus, the 9 Gbytes of availa- 
ble data storage area on the non-volatile data storage 
device of first computer A is occupied by the user data 
of first computer A, resident in the first data storage area 
506, and the computer specific user data in first data 

20 storage areas of each of the other 8 computers B-l is 
stored in the second data storage area 509 of the first 
computer A. 

[0035] Similarly, for second computer B, the first data 
storage area 507 of that computer's data storage device 

25 is occupied by data which is specific to second computer 
B, whereas the second data storage area 51 0 of the sec- 
ond computer B is occupied by the computer-specific 
data of first and third to ninth computers A, C-l. Similarly, 
for the third to ninth computers, each computer stores 

30 its own computer specific data, in its own first data stor- 
age area, as well as storing the computer specific data 
of all the other computers in the network in the second 
data storage area of that computer. 
[0036] This mode of operation is robust, since the da- 

35 ta from all 9 computers in the network can be recovered 
from any one computer's data storage device. It will be 
appreciated by those skilled in the art that in a fully ro- 
bust mode of operation, where each computer stores its 
own data and the data of all other computers, the 

40 number of computers which can participate in such a 
system is limited by the size of the data storage device 
in each computer, and the required amount of computer- 
specific data storage area (the first data storage area) 
which is required. 

45 [0037] Within each second data storage area 509-51 1 
the available non volatile storage area may be pre-par- 
titioned, such that a specific set of memory locations are 
reserved for each other computer in the network, so that 
other computers in the network which have a low 

so amount of data actually stored in their first data storage 
areas will still have available in each other computer, a 
partition of size corresponding to the first data storage 
area. 

[0038] Alternatively, the partitioning of the second da- 
55 ta storage area of each data storage device may be al- 
located dynamically and filled up by replication of data 
in the plurality of first data storage areas of the other 
computers in the network as and when required. 
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[0039] Referring to Fig. 6 herein, there are illustrated 
schematically process steps carried out by data protec- 
tion manager 307 for data protection of N selected data 
storage devices. In step 601 , the data manager divides 
the reserved second data storage area into N-1 seg- 
ments. This may be achieved during a setup procedure 
in which a user may select which data storage devices 
participate in the data protection process. For a number 
N participating data storage devices, the data storage 
manager 307 partitions each second data area of each 
of the N participating data storage devices into a number 
N-1 segments. In step 602, for each data storage de- 
vice, each of the N-1 segments are assigned to a cor- 
responding respective first data storage area of each of 
the other ones of the plurality N of data storage devices 
participating in the system. In step 603, it is checked 
whether the data protection backup is initiated. Initiation 
of a data protection backup can be made periodically, 
according to a backup schedule for each of the N par- 
ticipating data storage devices independently, or all oth- 
er the plurality N of data storage devices can be backed 
up simultaneously. In step 604, data in the first data stor- 
age area of a first data storage device is copied onto a 
corresponding segment on each of the other ones of the 
plurality of data storage devices, so that N-1 copies of 
the data in the first data storage area on the first com- 
puter are made. Similarly, for second, third and N data 
storage devices, data in the first data storage area of 
these devices is copied to same data storage areas on 
each of the N-1 other data storage devices. The result 
is that for each first data storage area, N-1 copies of the 
data contained in that first data storage area are made 
in the second data storage areas of the N-1 other data 
storage devices. 

[0040] Referring to Fig. 7 herein, there is illustrated 
schematically process steps for a second mode of op- • 
e ration of data protection manager 307. Transfer algo- 
rithm 404 operates in a differential backup mode when 
activated by backup scheduler 406. In step 700, set up 

. module 400 is used to set up a plurality of computer en- 
tities as illustrated in Figs. 3 and 5 herein as described 
in steps 600 and 601 previously. In step 701, for each 
data storage device, data files which are resident in the 
first data storage area of that device are copied to a cor- 
responding respective partition in each of the plurality 
of N-1 other data storage devices in the selected group 

. of N data storage devices. Each second data storage 
area has N-1 partitions, each partition assigned to a cor- 
responding respective data storage device other than 
the data storage device on which the partition exists. Ei- 
ther single parity or distributed parity may be used 
throughout the plurality of disks in the group. The first 
data storage area is reserved for use of the computer to 
which that data storage device belongs. In step 702 
backup is initiated via back up scheduler 406, either au- 
tomatically, or in response to a user request. In steps 
703 to 707, the transfer algorithm 404 in a differential 
backup mode cycles through each of the plurality N data 



storage devices which have been selected as a backup 
group by a user via set up module 400. In step 703 data 
files in the first data storage area of an N™ data storage 
device of the group is examined. In step 704, each file 
5 in the first data storage area of the N™ data storage 
device is compared with a corresponding file in each of 
the individual partitions within the second data storage 
areas of the remaining N-1 data storage devices. If the 
files in the first data storage area differ from those stored 
io in the second data storage areas in step 705, then in 
step 706 the files in the first data storage area which are 
found to have been changed, that is different to those 
stored in the second data storage areas, are copied to 
each of the second data storage areas of the other data 
15 storage devices in the group. In step 707, the value of 
N is cycled, that is incremented or decremented, to look 
at the next of the N data storage devices in the group. 
The loop 703-707 continues whenever a backup is ini- 
tiated, or periodically, so that differential backups of files 
20 which have changed since a previous backup, are cop- 
ied to the second data storage areas. 
[0041] Referring to Fig. 8 herein, there is illustrated a 
third mode of operation implemented by the transfer al- 
gorithm 404 in the data transfer allocation module 403. 
25 The third mode comprises an on-line mode of data pro- 
tection. Rather than operating the first or second modes 
of operation, that is the full back up differential backup 
modes, which are activated at a specific point in time, 
the third on-line mode operates substantially continu- 
al ously during operation of a network as a background on- 
going data protection process. The process shown in 
Fig. 8 may run independently on each of a plurality of N 
computer entities in a group. In step 800, all file system 
writes occurring to a first data storage area of the N™ 
35 data storage device are examined by the data protection 
manager 307. Whenever a file system write occurs, in 
steps 801 and 802 the write is replicated and sent to 
each of the partitions corresponding to the first data stor- 
age area of the N™ device, the partitions being resident 
40 in the second data area partitions of all other data stor- 
age devices. The steps 800, 801 continue, activated by 
writes to the first data storage area until the on-line back- 
up procedure is stopped by a user entering commands 
through backup scheduler 406. In a network of computer 
^5 entities comprising a group of N computer entities se- 
lected in an on-line backup group, for each computer 
entity, writes to the first data storage area of that com- 
puter activate sending of replicate data writes to all other 
computer entities for storage in the second data storage 
50 areas of the other computer entities. Writes may be sent 
across the network substantially simultaneously and in- 
dependently, by each of the N computer entities in a 
group. 

[0042] Referring to Fig. 9 herein, there is illustrated 
55 schematically as a series of lines, data written to a non- 
volatile data storage device, for example a rotating hard 
disk drive. A data storage area 900 comprising the 
whole of the non-volatile data storage device is occu- 
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pied by individual files designated as lines 901 . Data 
may be written at logical locations which are non-con- 
tiguous within the data storage area. 
[0043] As a prerequisite to dividing a data storage de- 
vice into a first data storage area reserved for use by a 
computer to which the data storage device forms an in- 
tegral part, and a second data storage area reserved for 
use by other computers in a network, existing data on 
the device is consolidated into a set of contiguous ad- 
dresses within a first data area 1001 of the data storage 
device, as illustrated schematically in Fig. 10 herein. 
The data storage device is divided such that the oper- 
ating system of the computer having immediate access 
to the data storage device can only utilize the first data 
storage area 1001 for operations involving data used lo- 
cally by the computer. Storage of the computer's oper- 
ating system, drivers, executable files and local data 
files is made in first data storage area 1001. A logical 
division marker 1002 is made such that the file system 
of the computer does not make accessible to normal use 
any non volatile data storage locations beyond the divi- 
sion marker 1002. The second data storage area 1003 
is reserved for use in storing data of other computers in 
the network. The data storage manager module 307 
controls access to the second data storage area 1003, 
by instructing the processor of the computer to transfer 
data received from the communications port 305 into 
and out of the second data storage area 1003. 
[0044] Size and divide module 402 operates as illus- 
trated schematically in Fig. 11 herein. In step 1100, the 
module determines the location of the current memory 
divider 1 002, to determine the boundary of the first data 
area. In step 1101, the size and divide module 402 finds 
data files in the entire non volatile data storage space 
900 of the data storage device. In step 1102 the module 
402 reads the logical location address of each file, and 
determines a size of each file. In step 1103, the module 
402 rewrites the addresses of all the found files, such 
that those files are placed in contiguous blocks in the 
first data area. This leaves the second data area 1003 
available for use in storage of data of other computers. 
As will be appreciated by those skilled in the art, com- 
puter programs for examining non volatile data storage 
area and rearranging data files in contiguous order are 
available in the art and may be incorporated into the data 
protection manager 307 of the first embodiment. Data 
files are moved from their original physical locations on 
the data storage device to new contiguous blocks of da- 
ta within the first data storage area. The second data 
area is an unused resource as far as the computer's op- 
erating system is concerned. The second data area is 
not used by the file system of the operating system res- 
ident on the computer. 

[0045] Referring to Figs. 12-14, there is illustrated 
schematically a set up procedure for selecting a plurality 
of computer entities to participate in a data protection 
work group, and for selecting the type of data protection 
and the timing of data protection to run within the work- 



group. In step 1200, a user at any of the computer en- 
tities on which the data protection manager 307 is in- 
stalled, having the user interface generator facility 407, 
may use a display generated on a visual display unit of 
5 the computer's user interface to select individual non- 
volatile data storage devices in a computer network. 
Such a display may include a plurality of icons as illus- 
trated in Fig. 13 showing a number of computers net- 
worked together, and displaying icons showing the indi- 
10 vidual non-volatile data storage devices which are as- 
signed to those individual computers. In the example of 
Fig. 13, there are shown 6 different computer entities, 
some of which have more than one non-volatile data 
storage device. 
15 [0046] In step 1 201 , the existing capacity of each lo- 
cated non-volatile data storage device is found. 
[0047] In steps 1 202-1 203, set up module 400 is used 
by a user to find and select a plurality of individual com- 
puter entities having associated data storage devices, 
20 and to define such data storage devices into a data pro- 
tection group in which data from each of the plurality of 
data storage devices in the group is distributed amongst 
the plurality of data storage devices in the group. Exist- 
ing data files on the data storage devices are consoli- 
25 dated to contiguous sets i n the first data storage area of 
the devices in step 1204. 

[0048] In step 1205, for each data storage device, a 
second data area is defined, the second data area being 
reserved for data specific to other data storage devices 
30 in the network, comprising other computer entities. Def- 
inition of the second storage area size restricts the size 
of the first storage area. 

[0049] In step 1 207, a computer entity can be selected 
by a user to initiate the backup procedure, in a data pro- 
35 tection group comprising a plurality of computer entities, 
one computer entity may be selected to control backup 
of all data storage devices in the group. In step 1 208, a 
type of data protection algorithm may be selected for the 
data storage devices in a particular group. A particular 
40 type of data protection algorith m is assig ned to each da- 
ta storage device in step 1 209 following selection in step 
1208. As shown schematically in Fig. 14, computers in 
a network may be divided into different data protection 
groups. For example, computers having drives 1, 2, 3, 
45 6 and 8, where drive 8 is a 20 gigabyte RAID array, are 
included in a same group, operating a distributed file 
system based data protection algorithm as herein after 
described. Computer 4, 5 and 7 comprise a second 
group which may operate according to a fully redundant 
so mode as described herein with reference to Fig. 6. In 
step 1210, a user may program the backup scheduler 
using backup schedule module 406 via user interface 
generator 407. It will be appreciated by those skilled in 
the art, that prior art code is available for scheduling 
55 backups, for example as used in the Hewlett Packard 
Colorado backup scheduler. Backup scheduler 406 may 
comprise a prior art code module, adapted to operate 
within the data protection manager 307. 
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[0050] Whereas the first data protection method and 
apparatus may operate satisfactorily for small clusters 
on computers, or work groups of computers in a larger 
network, the number of data storage devices participat- 
ing in the first method and apparatus are limited by the 5 
data capacity of the non-volatile data storage devices 
and the amount of user data specific to a particular com- 
puter which is stored in a first data area. A more scale- 
able solution is provided by the second data protection 
method described herein, in which data of a plurality of 10 
first data areas is distributed over a plurality of second 
data areas. 

[0051] The second data protection method makes 
use of a distributed file system algorithm module 405. 
[0052] Referring to fig. 15 herein, there is illustrated *5 
schematically a data protection scheme based upon a 
distributed file system. In step 1500, a distributed file 
system is set up. As will be appreciated by those skilled 
in the art, distributed file systems are known in other pri- 
or art environments. A prior art distributed file system 20 
algorithm may be incorporated into the DFS based data 
protection algorithm 405. A group of computer entities 
over which the distributed file system data protection 
method will run over is selected similarly as herein be- 
fore described using a computer selection displayed as 25 
shown in Fig. 1 3 and a drive selected display as shown 
in Fig. 1 4. In step 1 501 , each selected data storage de- 
vice to participate in a data protection group is divided 
into a first and second data storage area similarly as 
herein before described. In the general case, each data 30 
storage device must be configured into first and second 
data storage areas independently, since the data stor- 
age devices may, in practice, be of different capacities 
to each other. For example, one data storage device 
may have a 4 gigabyte capacity and a division of a first 35 
data storage area of 1 gigabyte may be selected and a 
second data storage area of 3 gigabytes. On the other 
hand, a second data storage device of 20 gigabytes ca- 
pacity may be partitioned into a 5 gigabyte first data stor- 
age area and a 1 5 gigabyte second data storage area. 40 
Configuration of each non -volatile data storage device 
may be made by configuring that particular associated 
computer entity locally, or, provided permissions are set 
allowing reconfiguration of the non-volatile data storage 
device from other computer entities, configuration may 45 
be made from a single computer entity, selecting each 
data storage device in the networked system. In step 
1502, each first data storage area is assigned to a cor- 
responding processor, and the first data area is reserved 
for storing data concerned with that particular processor, so 
In step 1 503, each second data storage area is assigned 
to the distributed file system. In step 1504, a degree of 
redundancy for the data protection scheme is specified 
by a user, using the displays generated by user interface 
display generator 407. One option for a degree of re- 55 
dundancy to be created in the data protection scheme, 
which may be selected in step 1 505, is to operate a com- 
munity of computer entities in a similar manner to which 



a redundant array of inexpensive disks (RAID) would be 
operated. If the data protection group comprises a 
number M computer entities, then data of an com- 
puter entity is rewritten across a stripe extending across 
a remaining M-1 computer entities in the group. In one 
embodiment the second data storage space in the M th 
computer entity, is used for storing data parity checks. 
This allows efficient use of the second data storage ar- 
eas. In another embodiment, parity may be distributed 
throughout the disks. These modes of operation has an 
advantage over prior art RAID arrays, in that a prior art 
RAID array may fail as a whole unit (although prior art 
RAID arrays are themselves made of individual compo- 
nent units which are in themselves replaceable). 
[0053] In the present system, each individual comput- 
ing entity is discrete, and unlikely to fail, and two com- 
puter entities will not fail as a single unit together. Whilst 
any individual computer entity or data storage device in 
that entity may fail as a complete unit, it is unlikely that 
all computer entities or two computer entities in a group 
will fail simultaneously. In contrast, a conventional RAID 
array may have a single point of failure caused by its 
reliance on a single processor. Similarly, a conventional 
RAID array is physically present in a single physical box. 
If theft of apparatus occurs, then it is likely that the whole 
physical box will be taken. In contrast, in the present 
implementations, individual computer entities are pro- 
vided in separate discrete individual boxes. A complete 
discrete computer entity may be removed, leaving other 
computer entities in place, and data recovery may still 
be obtained from the remaining computer entities. 
[0054] Prior art distributed file systems are not intend- 
ed for use with data backup. However, the functionality 
of a conventional distributed file system may be utilized 
for distribution of data of one computer entity over a plu- 
rality of other computer entities in a data protection 
group. Configuration of the data protection system de- 
pends upon a user's preference for redundancy. A user 
may select how a community of computer entities share 
their data between their non-volatile data storage devic- 
es. A number of concurrent failures of computer entities 
from which data is still recoverable, may be specified by 
a user by selecting how computer entities share data 
between their data storage devices within the data pro- 
tection group. The network may be expanded by addi- 
tion of a network based non-volatile data storage device, 
for the purposes of expansion and extra data protection. 
[0055] In step 1 506, a user may select a second DFS 
mode of operation, in which the distributed file system 
is requested to hold at least two copies of all data at any 
point in time. For example, in this method, where, for 
example there are computer entities A, B, C and D and 
the data of computer entity A as well as being stored on 
a first data storage area of computer entity A is also 
stored in the second data storage areas of computers 
B and C, and then computer C is removed from the sys- 
tem, the distributed file system detects that data from A 
is now stored only on the first data partition area of A 
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and the second data partition area of computer B, and 
therefore creates another copy of the data of A on a 
fourth computer D. In this system, there are forced to be 
at least two copies of data made available within the 
group of computer entities at any one time. Reallocation 
of data is achieved dynamically under control of the dis- 
tributed file system. 

[0056] Referring to Fig. 16, in step 1506 holding at 
least two copies of all data at any point in time may be 
approached by creating multiple distributed file systems 
across a plurality of datastorage devices in a data pro- 
tection group in step 1 600. This is achieved by creating 
multiple partitions in each second data storage area of 
each of a plurality of data storage devices in step 1 601 . 
The partitions may be of various different sizes, and 
each partition may contribute independently to a differ- 
ent logical distributed file system. Across all computer 
entities, a first level of DFS may run, followed by a sec- 
ond level of DFS configured to a different level of redun- 
dancy, and subsequent layers of DFS, each configured 
according to user selected preference to different levels 
of redundancy by assigning individual partitions to indi- 
vidual ones of a plurality of distributed file systems in 
step 1602. For example, a first distributed file system 
may be configured to stripe across all second data stor- 
age areas (step 1 505). A second distributed file system 
may be configured to back up individual first data stor- 
age areas to specified individual second data storage 
areas (1506). 

[0057] Once the distributed file systems are set up, in 
step 1 507, backup software is loaded. The backup soft- 
ware provides modes of operation including full backup, 
differential backup, and on-line backups as herein be- 
fore described with reference to Figs. 6-8. By virtue of 
the fact that all the computer entities are contributing to 
the distributed file system, any software loaded into the 
distributed file system is immediately visible to ail com- 
puter entities, including the backup software. Therefore, 
the backup software needs only to be loaded into one 
computer entity to be available to all computer entities 
in the group. To improve efficiency of operation of the 
DFS based data protection method, some types of file, 
for example operating system files which are common 
to a plurality of computer entities need only be stored in 
the DFS backup system once, with pointers to individual 
computer entities. 

[0058] The second method recognizes that distribut- 
ed file systems can be used for data protection, which 
is a purpose for which they are not designed for in the 
prior art to achieve benefits of reduced cost of ownership 
of a plurality of computer entities, by reuse of otherwise 
unused non-volatile data storage areas and enabling 
any computer entity within a data protection group se- 
lected by a user, which contributes to a distributed file 
system, to recover their data without h aving to load other 
media, and wait for user initiated commands. 



Claims 

1 . A network of computers characterized by compris- 
ing: 

5 

a plurality of individual computer devices 
(503-305) each having a non-volatile data stor- 
age device (500-502) and each having means 
(305) for communicating with at least one other 
10 one of said plurality of computers; 

each said non-volatile data storage device be- 
ing divided into a first data storage area 
(203-205) reserved for use by the correspond- 
15 ing computer device, and a second data stor- 

age area (509-51 1 ) reserved for backup stor- 
age of data contained in at least one said first 
data storage area of at least one other said non- 
volatile data storage device; 

20 

a data protection component (307) for providing 
data protection for user data stored on each of 
the non volatile data storage devices by storing 
the user data which is resident within the said 
25 first data storage area of each non volatile data 

storage device, in one or a plurality of said sec- 
ond data storage areas of other said non vola- 
tile data storage devices. 

30 2. The network of computers as claimed in claim 1 , 
comprising a setup component (400) for defining 
said data storage devices into a data protection 
group in which data from each of the plurality of data 
storage devices in the group are distributed 

35 amongst the plurality of data storage devices in the 
group. 

3. The network of computers as claimed in claim 1 or 
2, comprising: 

40 a sizer (402) means configured for selecting 

a size of a first and a second data storage area of 
each of said plurality of non-volatile data storage 
devices. 

45 4. The network of computers as claimed in any one of 
the preceding claims, further comprising a search 
means (401 ) for finding a plurality of non volatile da- 
ta storage devices. 

50 5. The network of computers as claimed in any one of 
the preceding claims, further comprising; 

a selection means (401) selecting individual 
ones of said plurality of non volatile data storage 
devices. 

55 

6. The network of computers as claimed in any one of 
the preceding claims, comprising: 

a scheduler means (406) for scheduling cop- 
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ying of data between individual ones of said plurality 
of non-volatile data storage devices. 

7. The network of computers as c]aimed in any one of 
the preceding claims, comprising" s 

a mode selector means for selecting between 
a distributed mode of data copying, in which data of 
each of a plurality of said first data areas is copied 
to a plurality of said second data areas, and a re- 
dundant mode in which data of each said first data 10 
storage area is copied to said second data storage 
areas of all of the other ones of plurality of non-vol- 
atile data storage devices. 

8. A computer entity characterized by comprising: is 

at least one data processor (303); 

at least one non volatile data storage device 
(500) being divided into a first data storage area 20 
(506) as reserved for use by data of said com- 
puter entity, and a second data storage area 
(509) reserved for storage of data contained in 
at least one other computer entity; and 

25 

a data protection component (307) for providing 
data protection for user data of said at least one 
other computer entity. 

9. The computer entity as claimed in claim 8 further 30 
comprising: 

at least one network port (305); 

a data transferor component for sending a copy 35 
of said data stored in said first data storage ar- 
ea to said network ports. 

10. The computer entity as claimed in claim 8 or 9, fur- 
ther comprising: 40 

a search component (401) for finding at least 
one other non volatile data storage device. 

1 1 . The computer entity as claimed in any one of claims 

8 to 10, further comprising: 45 

a selecting component (401) for selecting at 
least one individual one of a plurality of unrelated 
non volatile data storage devices. 

12. The computer entity as claimed in any one of claims 50 
8 to 1 1 , operable to store data relating to a plurality 

of other computing entities in said second data stor- 
age area in a striped distributed format. 

1 3. The computer entity as claimed in any one of claims 55 
8 to 12, wherein said means for storing said re- 
ceived data in said second data storage area oper- 
ates to store a plurality of individual blocks of data 



each relating to a corresponding respective other 
computer entity, in a plurality of partitions of said 
second data storage area, such that data of each 
said other computing entity is stored in a corre- 
sponding respective said partition. 

14. A method of data protection in a network of compu- 
ter entities characterized by comprising a plurality 
of individual computer entities, each having a data 
processor (303), and at least one non-volatile data 
storage device (500-502), and each having means 
for communicating with at least one other of said 
plurality of computer entities, said method compris- 
ing the steps of: 

for each said computer entity; 

dividing a said non-volatile data storage device 
of said computer entity into a first data storage 
area (203-205), and a second data storage ar- 
ea; 

assigning said first data storage area for use in 
storing data for the operation of a correspond- 
ing said respective said data processor (303); 
and 

assigning said second data storage area for 
storage of data by at least one other said com- 
puter entity. 

1 5. The method as claimed in claim 1 4, further compris- 
ing the step of: 

for each said second data storage area; 

partitioning said second data storage area into 
a plurality of partitions; and 

assigning each said partition for storing data 
specific to a corresponding respective other 
one of said plurality of computer entities. 

1 6. A method of data protection in a network of compu- 
ter entities, each said computer entity character- 
ised by comprising at least one data processor and 
at least one non-volatile data storage device 
(500-502); 

each said non-volatile data storage device be- 
ing divided into a first data storage area (203) 
dedicated for use by a said corresponding re- 
spective computer entity, and a second data 
storage area (509) dedicated for use in storing 
data of at least one other one of said plurality 
of computer entities; 

said method comprising the steps of: 
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searching said network of computer enti- 
ties to find at least one non -volatile data 
storage device (500-502) and selecting in- 
dividual ones of said at least one non-vol- 
atile data storage devices (500-502); and 

copying data stored in a first said data stor- 
age area of a first said non-volatile data 
storage device into a second said data 
storage area of a second said non-volatile 
data storage device (604). 

17. The method as claimed in claim 16, wherein 

each said second data storage area is ar- 
ranged into a plurality of partition areas, and each 
partition area of an individual said second datastor- 
age area is assigned to store data of a correspond- 
ing respective other said data storage device. 



area (506) dedicated for use by said processor, and 
a second data storage area (509) dedicated for use 
in storing data unrelated to said processor: 
said method comprising the steps of: 

5 

finding a plurality of other non-volatile data stor- 
age devices of other computer entities and se- 
lecting individual other said non-volatile data 
storage devices; 

10 

receiving via said network port a said data un- 
related to said processor; and 

storing said received data in said second data 
is storage area of said non -volatile data storage 

device. 

22. The method as claimed in claim 21, wherein: 



w 



18. The method as claimed in claim 16 or 17, wherein, 

for each of said plurality of computer entities: 
data stored in a said first data storage area of 
said at least one data storage device of said com- 
puter is replicated and stored in a plurality of second 
data storage areas of a plurality of other said com- 
puter entities within said network. 

19. The method as claimed in any one of claims 16 to 

18, wherein 

each said computer entity writes a write data to 
its corresponding said at least one data storage 
device; 

upon a said computer entity writing a said write 
data, said computer entity sends a copy of said 
write data to at least one other computer entity 
of said plurality of computer entities in said net- 
work (104); and 

said at least one other computer entity stores 
said write data in a second data storage area 
of a said data storage device of said other com- 
puter entity. 

20. The method as claimed in any one of claims 16 to 

19, wherein: 

data stored in a first said data storage area of 
a first computer entity is stored as a stripe in a plu- 
rality of said second data storage areas of a plurality 
of other ones of said computer entities comprising 
said network. 



20 said received data comprises data of a plurality 

of different other computer entities; 

said second data storage area is arranged into 
a plurality of different partitions (1004-1006); 
25 and 

said step of storing said received data compris- 
es: 

storing received data of each of said other 
30 computer entities in a corresponding respective 

said partition. 

23. The method as claimed in claim 21 or 22, wherein: 

said received data comprises incremental 
35 backup data of at least one other computer entity, 
said incremental backup data comprising files 
which have been rewritten to be different on said 
first data storage area compared to at least one cor- 
responding file in a said second data storage area. 

40 

24. The method as claimed in claim 21 , 22 or 23, where- 
in said received data comprises a write data sent 
by at least one other computer entity in response to 
a plurality of write events occurring locally on said 

45 other computer entity. 



50 



21. A method of data protection in a computer entity 
comprising at least one data processor (303), at 55 
least one non-volatile data storage device, and a 
network port (305), characterized by said data 
storage device being divided into a first data storage 
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N participating data storage deviceslf 



Divide each reserved second data^ "~ 60 " 



storage area into [N-1 ] segments 



y^ox each data storage device, assign eacn^ 
segment of second data storage area to a 
^^corresponding respective first data storage area oj 
one of the other data storage devices 
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YES 



For each data storage device, copy N 
data in first data storage of each of plurality of N 
other data storage devices to corresponding segment 
v in second data storage area of data storage 
device. 
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